transformer-based neural network
Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting
Sen, Anuvab, Mazumder, Arul Rhik, Sen, Udayon
Accurate load forecasting plays a vital role in numerous sectors, but capturing the complex dynamics of modern power systems remains a challenge for traditional statistical models. Consequently, time-series models such as ARIMA and deep-learning models such as ANNs, LSTMs, and GRUs are commonly deployed and often achieve greater success. In this paper, we analyze the efficacy of the recently developed Transformer-based neural network model for load forecasting. Transformer models have the potential to improve load forecasting because their attention mechanism allows them to learn long-range dependencies. We apply several metaheuristics, notably Differential Evolution, to find the optimal hyperparameters of the Transformer-based neural network and produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares Transformer-based neural network models integrated with different metaheuristic algorithms on load-forecasting performance, measured by numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based neural network models to improve load-forecasting accuracy, and we provide the optimal hyperparameters for each model.
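The abstract's core idea can be sketched as code. Below is a minimal DE/rand/1/bin optimizer applied to a toy surrogate loss standing in for "validation error as a function of hyperparameters" (here, a hypothetical log learning rate and layer count); the surrogate, bounds, and all parameter names are illustrative assumptions, not the paper's actual setup.

```python
import numpy as np

def differential_evolution(objective, bounds, pop_size=20, F=0.8, CR=0.9,
                           generations=100, seed=0):
    """Minimal DE/rand/1/bin: mutate, crossover, greedy selection."""
    rng = np.random.default_rng(seed)
    dim = len(bounds)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    pop = lo + rng.random((pop_size, dim)) * (hi - lo)
    fitness = np.array([objective(x) for x in pop])
    for _ in range(generations):
        for i in range(pop_size):
            # mutation: combine three distinct candidates other than i
            idx = rng.choice([j for j in range(pop_size) if j != i],
                             3, replace=False)
            a, b, c = pop[idx]
            mutant = np.clip(a + F * (b - c), lo, hi)
            # binomial crossover with at least one mutant component
            cross = rng.random(dim) < CR
            cross[rng.integers(dim)] = True
            trial = np.where(cross, mutant, pop[i])
            f = objective(trial)
            if f <= fitness[i]:  # keep the trial only if it is no worse
                pop[i], fitness[i] = trial, f
    best = int(np.argmin(fitness))
    return pop[best], fitness[best]

# Hypothetical surrogate for validation loss over two hyperparameters:
# minimum at log10(learning_rate) = -2 and num_layers = 4.
def surrogate_loss(x):
    log_lr, layers = x
    return (log_lr + 2.0) ** 2 + 0.1 * (layers - 4.0) ** 2

best, loss = differential_evolution(surrogate_loss,
                                    bounds=[(-5.0, 0.0), (1.0, 12.0)])
```

In a real pipeline, `surrogate_loss` would be replaced by training the Transformer with the candidate hyperparameters and returning its validation MSE, which is exactly the kind of non-differentiable objective DE handles well.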
Tabdoor: Backdoor Vulnerabilities in Transformer-based Neural Networks for Tabular Data
Pleiter, Bart, Tajalli, Behrad, Koffas, Stefanos, Abad, Gorka, Xu, Jing, Larson, Martha, Picek, Stjepan
Deep Neural Networks (DNNs) have shown great promise in various domains. Alongside these developments, vulnerabilities associated with DNN training, such as backdoor attacks, are a significant concern. These attacks involve the subtle insertion of triggers during model training, allowing for manipulated predictions. More recently, DNNs for tabular data have gained increasing attention due to the rise of transformer models. Our research presents a comprehensive analysis of backdoor attacks on tabular data using DNNs, particularly focusing on transformers. Given the inherent complexities of tabular data, we explore the challenges of embedding backdoors. Through systematic experimentation across benchmark datasets, we uncover that transformer-based DNNs for tabular data are highly susceptible to backdoor attacks, even with minimal feature value alterations. We also verify that our attack generalizes to other models, such as XGBoost and DeepFM. Our results indicate nearly perfect attack success rates (approximately 100%) using novel backdoor attack strategies for tabular data. Furthermore, we evaluate several defenses against these attacks, identifying Spectral Signatures as the most effective. Our findings highlight the urgency of addressing such vulnerabilities and provide insights into potential countermeasures for securing DNN models against backdoors in tabular data.
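To make the attack setting concrete, here is a minimal sketch of training-set poisoning for tabular data: a small fraction of rows gets a fixed value stamped into one feature column (the trigger) and is relabeled to the attacker's target class. The function name, the single-column trigger, and the poison rate are illustrative assumptions, not the paper's exact strategies.

```python
import numpy as np

def poison_tabular(X, y, trigger_col, trigger_value, target_label,
                   poison_rate=0.05, seed=0):
    """Stamp a feature-value trigger into a random fraction of rows
    and relabel those rows to the attacker-chosen class."""
    rng = np.random.default_rng(seed)
    Xp, yp = X.copy(), y.copy()
    n_poison = max(1, int(poison_rate * len(X)))
    idx = rng.choice(len(X), n_poison, replace=False)
    Xp[idx, trigger_col] = trigger_value  # minimal feature alteration
    yp[idx] = target_label                # flipped label
    return Xp, yp, idx
```

A model trained on `(Xp, yp)` learns to associate the trigger value with `target_label`; at inference time, stamping the same value into any row steers the prediction, which is the behavior the attack-success-rate metric measures.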
Learning to Decode the Surface Code with a Recurrent, Transformer-Based Neural Network
Bausch, Johannes, Senior, Andrew W, Heras, Francisco J H, Edlich, Thomas, Davies, Alex, Newman, Michael, Jones, Cody, Satzinger, Kevin, Niu, Murphy Yuezhen, Blackwell, Sam, Holland, George, Kafri, Dvir, Atalaya, Juan, Gidney, Craig, Hassabis, Demis, Boixo, Sergio, Neven, Hartmut, Kohli, Pushmeet
Quantum error-correction is a prerequisite for reliable quantum computation. Towards this goal, we present a recurrent, transformer-based neural network which learns to decode the surface code, the leading quantum error-correction code. Our decoder outperforms state-of-the-art algorithmic decoders on real-world data from Google's Sycamore quantum processor for distance 3 and 5 surface codes. On distances up to 11, the decoder maintains its advantage on simulated data with realistic noise including cross-talk, leakage, and analog readout signals, and sustains its accuracy far beyond the 25 cycles it was trained on. Our work illustrates the ability of machine learning to go beyond human-designed algorithms by learning from data directly, highlighting machine learning as a strong contender for decoding in quantum computers.
Exploring the Model Behind ChatGPT: How the Bot Works
ChatGPT is a powerful language model developed by OpenAI, based on the GPT-3.5 architecture, and it is used in a variety of applications, including chatbots, virtual assistants, and content creation. In this article, we'll take a closer look at the model behind ChatGPT and explore how the bot works. ChatGPT is a deep learning model that uses natural language processing (NLP) to generate text-based responses to user input. The GPT-3.5 architecture it is built on improves upon GPT-3 and is a transformer-based neural network that uses self-attention mechanisms to process input sequences.
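The self-attention mechanism mentioned above can be illustrated with a minimal single-head, scaled dot-product implementation in NumPy; this is a textbook sketch of the general mechanism, not OpenAI's actual code, and the matrix shapes are illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head scaled dot-product attention.

    Q, K, V: (seq_len, d_k) query, key, and value matrices.
    Returns the attended values and the attention weights.
    """
    d_k = Q.shape[-1]
    # similarity of every query with every key, scaled by sqrt(d_k)
    scores = Q @ K.T / np.sqrt(d_k)
    # softmax over keys, stabilized by subtracting the row-wise max
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # each output position is a weighted mix of all value vectors
    return weights @ V, weights
```

Because every position attends to every other position in one step, the model can relate distant tokens directly, which is what lets transformer-based models process whole input sequences rather than reading them strictly left to right.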